Time-Varying Gaussian Process Bandit Optimization
We consider the sequential Bayesian optimization problem with bandit
feedback, adopting a formulation that allows for the reward function to vary
with time. We model the reward function using a Gaussian process whose
evolution obeys a simple Markov model. We introduce two natural extensions of
the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The
first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB,
instead forgets about old data in a smooth fashion. Our main contribution
comprises novel regret bounds for these algorithms, providing an explicit
characterization of the trade-off between the time horizon and the rate at
which the function varies. We illustrate the performance of the algorithms on
both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB
to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover,
both algorithms significantly outperform classical GP-UCB, which treats
stale and fresh data equally. Comment: To appear in AISTATS 201
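The smooth-forgetting idea can be sketched in a few lines: down-weight the kernel between observations by a factor that decays with their temporal distance, then pick the next point by the usual upper-confidence rule. This is only an illustrative sketch; the squared-exponential kernel, the forgetting factor, and all constants below are simplifying assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

def tv_gp_ucb_select(X_obs, y_obs, times, grid, t, eps=0.03,
                     lengthscale=0.5, noise=0.1, beta=2.0):
    """Pick the next query point with a time-varying GP-UCB-style rule.

    Observations are down-weighted via (1 - eps)^(|t_i - t_j| / 2) on the
    kernel, so stale data contributes less to the posterior (the smooth
    'forgetting' idea, as opposed to a hard reset).
    """
    def k(a, b):  # squared-exponential kernel (illustrative choice)
        return np.exp(-0.5 * (a - b) ** 2 / lengthscale ** 2)

    n = len(X_obs)
    if n == 0:
        return grid[0]
    X_obs = np.asarray(X_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    times = np.asarray(times, dtype=float)

    # Kernel matrix between observations, with temporal forgetting.
    K = k(X_obs[:, None], X_obs[None, :])
    K *= (1 - eps) ** (np.abs(times[:, None] - times[None, :]) / 2)
    K += noise ** 2 * np.eye(n)

    # Cross-kernel between grid points (queried now, at time t) and data.
    Ks = k(grid[:, None], X_obs[None, :])
    Ks *= (1 - eps) ** (np.abs(t - times[None, :]) / 2)

    # Standard GP posterior mean and variance, then the UCB rule.
    alpha = np.linalg.solve(K, y_obs)
    mu = Ks @ alpha
    var = 1.0 - np.einsum('ij,ij->i', Ks, np.linalg.solve(K, Ks.T).T)
    ucb = mu + beta * np.sqrt(np.maximum(var, 0.0))
    return grid[int(np.argmax(ucb))]
```

With eps = 0 this reduces to ordinary GP-UCB; larger eps discounts old observations faster.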
Robust Submodular Maximization: A Non-Uniform Partitioning Approach
We study the problem of maximizing a monotone submodular function subject to
a cardinality constraint k, with the added twist that a number τ of items
from the returned set may be removed. We focus on the worst-case setting
considered in (Orlin et al., 2016), in which a constant-factor approximation
guarantee was given for τ = o(√k). In this paper, we solve a key
open problem raised therein, presenting a new Partitioned Robust (PRo)
submodular maximization algorithm that achieves the same guarantee for more
general τ = o(k). Our algorithm constructs partitions consisting of
buckets with exponentially increasing sizes, and applies standard submodular
optimization subroutines on the buckets in order to construct the robust
solution. We numerically evaluate the performance of PRo in data
summarization and influence maximization, demonstrating gains over both the
greedy algorithm and the algorithm of (Orlin et al., 2016). Comment: Accepted to ICML 201
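The partitioned construction can be illustrated as follows: fill buckets of exponentially increasing size with greedy solutions, then return their union. This is a rough sketch only; the number of buckets per partition, the bucket sizes, and the bucket-local greedy objective below are simplifying assumptions, not the paper's exact construction or its constants.

```python
import math

def greedy(ground, f, size):
    """Standard greedy: repeatedly add the element with largest marginal gain."""
    S = []
    for _ in range(min(size, len(ground))):
        best = max((e for e in ground if e not in S),
                   key=lambda e: f(S + [e]) - f(S))
        S.append(best)
    return S

def partitioned_robust(ground, f, k, tau):
    """PRo-style sketch: partitions of buckets with exponentially
    increasing sizes, each bucket filled by a greedy subroutine.
    Removing up to tau items can wipe out only a few small buckets,
    which is the intuition behind the robustness guarantee."""
    solution, remaining = [], list(ground)
    i = 0
    while 2 ** i <= tau and len(solution) < k:
        # partition i: roughly tau / 2^i buckets, each of size 2^i
        for _ in range(math.ceil(tau / 2 ** i)):
            if len(solution) >= k or not remaining:
                break
            bucket = greedy(remaining, f, min(2 ** i, k - len(solution)))
            solution += bucket
            remaining = [e for e in remaining if e not in bucket]
        i += 1
    # spend any leftover budget with one final greedy pass
    solution += greedy(remaining, f, k - len(solution))
    return solution[:k]
```

Note that, for brevity, each bucket's greedy run scores gains relative to the bucket alone rather than the full partial solution; a faithful implementation would follow the paper's subroutine exactly.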
Adversarially Robust Optimization with Gaussian Processes
In this paper, we consider the problem of Gaussian process (GP) optimization
with an added robustness requirement: The returned point may be perturbed by an
adversary, and we require the function value to remain as high as possible even
after this perturbation. This problem is motivated by settings in which the
underlying functions during optimization and implementation stages are
different, or when one is interested in finding an entire region of good inputs
rather than only a single point. We show that standard GP optimization
algorithms do not exhibit the desired robustness properties, and provide a
novel confidence-bound based algorithm StableOpt for this purpose. We
rigorously establish the required number of samples for StableOpt to find a
near-optimal point, and we complement this guarantee with an
algorithm-independent lower bound. We experimentally demonstrate several
potential applications of interest using real-world data sets, and we show that
StableOpt consistently succeeds in finding a stable maximizer where several
baseline methods fail. Comment: Corrected typo
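On a discrete grid, the max–min flavor of such a confidence-bound rule can be sketched directly: pick the point whose worst-case upper confidence bound within a perturbation ball is largest, with the adversarial perturbation chosen via the lower bound. The ball shape, the hard radius, and the tie-breaking below are illustrative assumptions, not StableOpt's exact specification.

```python
import numpy as np

def stableopt_select(grid, ucb, lcb, radius):
    """StableOpt-style selection (sketch): max over points of the min
    UCB inside a perturbation ball, then the adversary's in-ball point
    minimizing the LCB. `ucb`/`lcb` are precomputed confidence bounds
    on the grid (e.g., from a GP posterior)."""
    def ball(i):
        return [j for j, x in enumerate(grid) if abs(x - grid[i]) <= radius]

    # agent: best point under the most adversarial in-ball UCB
    robust_scores = [min(ucb[j] for j in ball(i)) for i in range(len(grid))]
    i_star = int(np.argmax(robust_scores))
    # adversary: the in-ball point minimizing the lower bound
    j_adv = min(ball(i_star), key=lambda j: lcb[j])
    return i_star, j_adv
```

The point of the max–min rule is visible on a toy example: a narrow spike with poor neighbors loses to a broad plateau, since perturbations off the spike destroy its value.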
Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation
We present a new algorithm, truncated variance reduction (TruVaR), that
treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian
processes in a unified fashion. The algorithm greedily shrinks a sum of
truncated variances within a set of potential maximizers (BO) or unclassified
points (LSE), which is updated based on confidence bounds. TruVaR is effective
in several important settings that are typically non-trivial to incorporate
into myopic algorithms, including pointwise costs and heteroscedastic noise. We
provide a general theoretical guarantee for TruVaR covering these aspects, and
use it to recover and strengthen existing results on BO and LSE. Moreover, we
provide a new result for a setting where one can select from a number of noise
levels having associated costs. We demonstrate the effectiveness of the
algorithm on both synthetic and real-world data sets. Comment: Accepted to NIPS 201
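One greedy step of the truncated-variance idea can be sketched as follows: among candidate observation points, choose the one whose measurement most shrinks the sum of variances, truncated at a target level, over the current set of potential maximizers. The rank-one variance update and all constants are illustrative assumptions; the paper's algorithm additionally handles batches, costs, and the shrinking confidence parameter.

```python
import numpy as np

def truvar_step(K, noise, M, eta):
    """One TruVaR-style step (sketch). K is the prior covariance over
    all grid points, M indexes the set of potential maximizers (or
    unclassified points for LSE), and eta is the truncation level.
    Returns the index whose observation minimizes the truncated sum."""
    var = np.diag(K)

    def truncated_sum_after(j):
        # posterior variance at every point after one observation at j
        new_var = var - K[:, j] ** 2 / (var[j] + noise ** 2)
        # truncation: variance below eta earns no further credit
        return sum(max(new_var[i], eta) for i in M)

    scores = [truncated_sum_after(j) for j in range(len(K))]
    return int(np.argmin(scores))
```

Because the sum is restricted to M and truncated at eta, the rule stops rewarding measurements in regions that are already resolved or irrelevant, which is what distinguishes it from plain variance reduction.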
Robust Adaptive Decision Making: Bayesian Optimization and Beyond
The central task in many interactive machine learning systems can be formalized as the sequential optimization of a black-box function. Bayesian optimization (BO) is a powerful model-based framework for \emph{adaptive} experimentation, where the primary goal is the optimization of the black-box function via sequentially chosen decisions. In many real-world tasks, it is essential for the decisions to be \emph{robust} against, e.g., adversarial failures and perturbations, dynamic and time-varying phenomena, a mismatch between simulations and reality, etc. Under such requirements, the standard methods and BO algorithms become inadequate. In this dissertation, we consider four research directions with the goal of enhancing robust and adaptive decision making in BO and associated problems.
First, we study the related problem of level-set estimation (LSE) with Gaussian processes (GPs). While in BO the goal is to find a maximizer of the unknown function, in LSE one seeks to find all "sufficiently good" solutions. We propose an efficient confidence-bound based algorithm that treats BO and LSE in a unified fashion. It is effective in settings that are non-trivial to incorporate into existing algorithms, including cases with pointwise costs, heteroscedastic noise, and the multi-fidelity setting. Our main result is a general regret guarantee that covers these aspects.
Next, we consider GP optimization with a robustness requirement: An adversary may perturb the returned design, and so we seek to find a robust maximizer in case this occurs. This requirement is motivated by, e.g., settings where the functions during the optimization and implementation stages are different. We propose a novel robust confidence-bound based algorithm. We establish rigorous regret guarantees for this algorithm and complement them with an algorithm-independent lower bound. We experimentally demonstrate that our robust approach consistently succeeds in finding a robust maximizer while standard BO methods fail.
We then investigate the problem of GP optimization in which the reward function varies with time. The setting is motivated by many practical applications in which the function to be optimized is not static. We model the unknown reward function via a GP whose evolution obeys a simple Markov model. Two confidence-bound based algorithms with the ability to "forget" about old data are proposed. We obtain regret bounds for these algorithms that jointly depend on the time horizon and the rate at which the function varies.
Finally, we consider the maximization of a set function subject to a cardinality constraint in the case that a number of items from the returned set may be removed. One notable application is in batch BO, where we need to select experiments to run, but some of them can fail. Our focus is on the worst-case adversarial setting, and we consider both \emph{submodular} (i.e., satisfying a natural notion of diminishing returns) and \emph{non-submodular} objectives. We propose robust algorithms that achieve constant-factor approximation guarantees. In the submodular case, the result on the maximum number of allowed removals is improved to o(k) in comparison to the previously known o(√k). In the non-submodular case, we obtain new guarantees in the support selection and batch BO tasks. We empirically demonstrate the robust performance of our algorithms in these tasks, as well as in data summarization and influence maximization.
Near-Optimally Teaching the Crowd to Classify
How should we present training examples to learners to teach them
classification rules? This is a natural problem when training workers for
crowdsourcing labeling tasks, and is also motivated by challenges in
data-driven online education. We propose a natural stochastic model of the
learners, modeling them as randomly switching among hypotheses based on
observed feedback. We then develop STRICT, an efficient algorithm for selecting
examples to teach to workers. Our solution greedily maximizes a submodular
surrogate objective function in order to select examples to show to the
learners. We prove that our strategy is competitive with the optimal teaching
policy. Moreover, for the special case of linear separators, we prove that an
exponential reduction in error probability can be achieved. Our experiments on
simulated workers as well as three real image annotation tasks on Amazon
Mechanical Turk show the effectiveness of our teaching algorithm.
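The greedy surrogate-maximization step can be illustrated on a toy hypothesis class: pick the example that most reduces the surviving weight of hypotheses inconsistent with the true labeling. The hypothesis class, the flat mislabel penalty of 0.2, and the weight update below are illustrative assumptions, not STRICT's actual learner model or objective.

```python
def strict_like_teacher(examples, hypotheses, truth, n_show):
    """Greedy teaching sketch in the spirit of STRICT. Each hypothesis
    is a function example -> label, and `truth` gives correct labels.
    Showing an example a wrong hypothesis mislabels multiplies that
    hypothesis's weight by a penalty, so the greedy rule favors
    examples that discredit as much wrong mass as possible."""
    shown = []

    def wrong_weight(S):
        total = 0.0
        for h in hypotheses:
            if all(h(x) == truth(x) for x in examples):
                continue  # agrees with truth everywhere: not "wrong"
            w = 1.0
            for x in S:
                w *= 1.0 if h(x) == truth(x) else 0.2  # mislabel penalty
            total += w
        return total

    for _ in range(n_show):
        best = min((x for x in examples if x not in shown),
                   key=lambda x: wrong_weight(shown + [x]))
        shown.append(best)
    return shown
```

On one-dimensional threshold classifiers, this rule naturally picks examples near the decision boundary, which are exactly the ones that separate wrong hypotheses from the truth.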
Streaming Robust Submodular Maximization: A Partitioned Thresholding Approach
We study the classical problem of maximizing a monotone submodular function
subject to a cardinality constraint k, with two additional twists: (i) elements
arrive in a streaming fashion, and (ii) m items from the algorithm's memory are
removed after the stream is finished. We develop a robust submodular algorithm
STAR-T. It is based on a novel partitioning structure and an exponentially
decreasing thresholding rule. STAR-T makes one pass over the data and retains a
short but robust summary. We show that after the removal of any m elements from
the obtained summary, a simple greedy algorithm STAR-T-GREEDY that runs on the
remaining elements achieves a constant-factor approximation guarantee. In two
different data summarization tasks, we demonstrate that it matches or
outperforms existing greedy and streaming methods, even if they are allowed the
benefit of knowing the removed subset in advance. Comment: To appear in NIPS 201
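The one-pass structure can be sketched as follows: build buckets of exponentially increasing capacity whose admission thresholds decrease exponentially, and route each streamed item into the first bucket where its marginal gain clears the threshold. The exact bucket counts, capacities, and threshold constants below are illustrative assumptions (gamma plays the role of a guess of the optimal value), not STAR-T's precise parameterization.

```python
import math

def star_t_like(stream, f, tau, gamma):
    """One-pass streaming sketch in the spirit of STAR-T. Returns a
    short summary that remains useful even after up to tau items are
    later removed, since value is spread across many buckets."""
    n_parts = math.ceil(math.log2(tau)) + 1 if tau > 1 else 1
    buckets = []
    for i in range(n_parts):
        # partition i: ~tau / 2^i buckets of capacity 2^i, with an
        # exponentially decreasing admission threshold
        for _ in range(math.ceil(tau / 2 ** i)):
            buckets.append({"cap": 2 ** i,
                            "thr": gamma / (2 ** i * tau),
                            "items": []})
    for x in stream:  # single pass over the data
        for b in buckets:
            gain = f(b["items"] + [x]) - f(b["items"])
            if len(b["items"]) < b["cap"] and gain >= b["thr"]:
                b["items"].append(x)
                break
    return [x for b in buckets for x in b["items"]]
```

After the stream ends and up to tau elements are deleted, the paper's STAR-T-GREEDY simply runs plain greedy on the surviving summary.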
Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning
In real-world tasks, reinforcement learning (RL) agents frequently encounter
situations that are not present during training time. To ensure reliable
performance, the RL agents need to exhibit robustness against worst-case
situations. The robust RL framework addresses this challenge via a worst-case
optimization between an agent and an adversary. Previous robust RL algorithms
are either sample inefficient, lack robustness guarantees, or do not scale to
large problems. We propose the Robust Hallucinated Upper-Confidence RL
(RH-UCRL) algorithm to provably solve this problem while attaining near-optimal
sample complexity guarantees. RH-UCRL is a model-based reinforcement learning
(MBRL) algorithm that effectively distinguishes between epistemic and aleatoric
uncertainty and efficiently explores both the agent and adversary decision
spaces during policy learning. We scale RH-UCRL to complex tasks via neural
network ensemble models as well as neural network policies. Experimentally, we
demonstrate that RH-UCRL outperforms other robust deep RL algorithms in a
variety of adversarial environments.
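The optimism-against-a-worst-case-adversary idea can be caricatured at the bandit level: an ensemble of models scores every (agent action, adversary action) pair, ensemble spread stands in for epistemic uncertainty, and the agent plays the action whose optimistic value survives the worst adversary reply. This is a deliberately crude single-step sketch of the principle, not the RH-UCRL algorithm, which plans over hallucinated dynamics with full policies.

```python
import numpy as np

def robust_optimistic_choice(ensemble_returns):
    """ensemble_returns has shape (models, agent_actions, adversary_actions).
    Ensemble mean + spread gives an optimistic value per action pair
    (epistemic bonus); the agent then maximizes the worst case over
    adversary replies (max over rows of the min over columns)."""
    r = np.asarray(ensemble_returns, dtype=float)
    mean, std = r.mean(axis=0), r.std(axis=0)
    ucb = mean + std                      # optimistic value per pair
    return int(ucb.min(axis=1).argmax())  # max over agent, min over adversary
```

In a toy two-action game, an action with a high payoff only when the adversary cooperates loses to an action with a moderate payoff against every reply, which is the robustness the abstract is after.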